THE EFFECTS OF ITEM ORDER AND ANXIETY
ON TEST PERFORMANCE AND STRESS

Introduction 

It is a generally accepted practice for test constructors 
to arrange the items in a test in order of increasing difficulty. 

The rationale behind this practice is simple: it increases
the probability that an examinee will succeed on the early items
and thereby gain confidence for the more difficult items later in
the test. However, tests are not always constructed in this way.

For example, to reduce the chance of cheating, examiners sometimes 
make the order of presentation of items in a test different for 
different examinees. There is some evidence that the order of items 
in a test has an effect on performance. 

MacNicol (1956) found that when items were ordered from 
difficult-to-easy, the mean number of correct answers on the test 
was significantly lower than the mean number of correct answers 
obtained when the items were ordered in one of two other ways: from 
easy to difficult and at random. There was no appreciable difference
between average performance on the easy-to-difficult and the random
orders. These results were obtained for a test administered under 
essentially power conditions. 

One explanation of this phenomenon is suggested by Flaugher,
Melton and Myers (1966). They found that when easy items appeared
later in a test they were not reached by some subjects. In other
words, if the test is speeded, the difficult-to-easy order
disadvantages slow students, since they may not have a chance to
answer the easier items. This explanation is inadequate for
MacNicol's results, however, since her test was administered under
power conditions.

Another possible explanation has been offered by Mollenkopf
(1950). He argued that fatigue and pressure to finish could account
for poorer performance on items when they appeared later in the test 
than when they appeared earlier in the test. 

Another and perhaps more interesting possibility is that 
personality characteristics of individual subjects hinder their 
performance of items in the difficult-to-easy order. One such 
personality characteristic which has been found to influence test 
performance is anxiety. This is a variable which has been studied 
extensively in test situations (I. G. Sarason, 1960; Ruebush, 1963)
but apparently never in connection with item order. 

Test anxiety is considered to be specific anxiety associated 
with test situations. It is measured by instruments such as the 
Alpert-Haber Achievement Anxiety Test (1960). This type of anxiety 
is generally found to be negatively related to test performance 
(Alpert and Haber, 1960; Carrier and Jewell, 1966; Grooms and Endler,
1960; Mandler and Sarason, 1952; I. G. Sarason, 1956b, 1957, 1959a,
1963). However, the size of the correlation between test anxiety and
test performance depends on the testing situation. A study by
I. G. Sarason (1961) found stronger negative correlations between
test anxiety and aptitude test scores than between test anxiety and
grade point averages. The aptitude scores were obtained from tests
administered in large group sessions and were to support applications
to college, a highly desired goal. On the other hand, grade point
averages were based on classroom tests administered over the course
of a semester, and any one test would be unlikely to be perceived as
important. Thus Sarason's evidence suggests that the more important
a test seems to the student, the greater the negative correlation
between test anxiety and performance.



It should be observed at this point that anxiety is a word
used in two ways: to refer to a personality trait and to refer to
a transitory state. Studies by Cattell and Scheier (1958, 1961)
suggest that anxiety questionnaires measure a relatively stable and
permanent personality trait of the individual, while physiological
indicants of anxiety such as heart rate and palmar sweat measure a
transitory state of the individual which fluctuates over time. This
transitory state has been referred to in the literature as arousal
or stress (Spielberger, 1966).



Further evidence that anxiety questionnaires measure a relatively
permanent personality trait was provided by Smith (1965); he found that
the characteristic level of questionnaire anxiety was unaffected by
the stress conditions of the test administration.



The theory distinguishing trait and state anxiety holds that
individuals with high anxiety scores as measured by a questionnaire
are not anxious all the time; however, such individuals are more likely
than less anxious individuals to emit anxiety responses in personally
threatening situations such as tests. These anxiety responses
interfere with task-relevant activities and lead to a subsequent
reduction in performance level. Anxiety responses include heightened
physiological activity (e.g., heart rate and sweating) and self-effacing
statements (e.g., "I can't pass this test.").

Preliminary support for this theory came from results reported 
in learning experiments. I. G. Sarason (1956a, 1958) found that 
under certain instructional conditions, low anxious subjects were 
superior in performance to high anxious subjects, yet under different 
instructional conditions there were no differences. 

Supporting evidence for the trait-state theory is also found
in the literature on anxiety and test performance. Many studies
suggest that individuals obtaining high scores on anxiety questionnaires
differ from other individuals in the extent to which their performance
is disrupted under conditions of stress (I. G. Sarason, 1957, 1959b).
Typically the stress has been created by verbal instructions, e.g., by
informing the subject that he is about to take an intelligence test.
Wrightsman (1962) found that when a test was seen as important, the
scores of anxious subjects were significantly lower than those of
non-anxious subjects. When the test was seen as unimportant, anxiety
was unrelated to performance.

Similar findings were reported by Sarason and Palola (1960). They
found that under neutral or reassuring test instructions (informing the
group that they were involved in a research project in which their function
was to evaluate the test), high test anxious subjects did not differ
from low test anxious subjects in performance. However, when the test was
administered under stressful conditions (informing the group that
the test had been found to predict course grades, success in later
life, and even personality), the low test anxious subjects scored
significantly higher.

These previous findings not only support the trait-state theory
of anxiety but also point to the use of test directions to vary the
stress of a test situation. Another test characteristic which could
affect the stress of the testing situation is the order of
presentation of items.



Objectives 



The review of the literature in the previous section indicated
that item order, stress of the test situation, and test anxiety have
an effect on test performance. This study was designed to investigate
the relationships among these three variables.

The first objective of this study was to investigate, under
power testing conditions, the effect of item order on test performance.
It was expected that a difficult-to-easy order of items would prove
to be more difficult than the reverse order. This aspect of
the study is an attempt to replicate the findings of MacNicol
(1956) in a different content domain, mathematics.



In order to attain this first objective, a standardized
mathematics achievement test was presented to a group of high school
students. One group was given the test with the items ordered from
easy to difficult, while the second group was given the test with the
items in reverse order. Performance scores for males and females were
compared under the two arrangements.

The second objective of this study was to investigate the 
effect of item order on the stress induced in a test situation. The 
question asked was whether the stress of a test situation could be 
increased merely by changing the order of presentation of a set of 
test questions. It was hypothesized that a difficult-to-easy order 
of items would generate more anxiety responses and result in a higher 
level of stress for the subjects than would an easy-to-difficult
order of items. A test of this hypothesis was made by measuring a
physiological indicant of stress several times during the testing 
session. The groups working the items in different orders were compared 
on their average level of stress. 

The third objective of this study was to investigate the 
interaction of item order and anxiety. The question asked was whether 
there was any difference in performance of the high and low test anxious 
subjects on the two arrangements of test items. On the assumption 
that a difficult-to-easy order of items leads to a more stressful test 
situation than the reverse order, it was expected that the difference 
in performance between the high and low test anxious subjects would 
be greater on the difficult-to-easy order than on the reverse order. 
Anxiety scores were obtained by administering a standardized anxiety
questionnaire.
Method



Test Anxiety, Stress, and Achievement Measures 

Test anxiety measure. The Achievement Anxiety Test (AAT)
was used in this study to obtain a measure of test anxiety. This
instrument was developed by Alpert and Haber (1960). It consists
of two independent scales: a facilitating scale of nine items and
a debilitating scale of ten items. Items on the facilitating scale
are of the form "Anxiety helps me to do better during exams and
tests," while items on the debilitating scale are of the form
"Anxiety interferes with my performance during examinations and tests."
Alpert and Haber (1960) state that the two scales have both undergone
numerous revisions based on the results of item analyses, validity
studies, and theoretical reformulations. The test-retest reliabilities
for each scale over a ten-week period are reported to be about .85.

The two scales were combined into one questionnaire with the 
odd numbered items being from the debilitating scale and the even 
numbered items from the facilitating scale. 

Stress measure. A physiological indicant of stress, heart 
rate, has been reported to be one of the best indicators of stress 
(Cattell and Scheier, 1961). In this study, heart rate was measured 
using a pulsemeter. 

The pulsemeter (produced by Fraser Sweatman Incorporated, 
Pennsylvania) is transistorized and battery operated, and provides an 
instantaneous reading of the pulse. The pulsemeter has a range of 
30 to 200 beats per minute with a needle indicator dial calibrated
to provide quick and precise readings. The subject places a finger 
in a pressure sensitive device which measures pulse rate and displays 
it on the dial. 

The ease with which the pulse rate can be measured makes the 
pulsemeter a useful piece of equipment. In this study it was possible 
to obtain the heart rate repeatedly during the administration of a 
test with a minimum of disruption to the subject. 

Achievement measure. The achievement test used in this study
was the Cooperative Mathematics Test, Algebra II, Form B (© 1962, ETS,
Princeton, N.J.). Of the 40 items in the test, 30 were selected for
use in this study. The chosen items covered topics taught in the
Grade 11 Mathematics Course in Ontario.

The mathematics items were pretested on 250 students in two
high schools in Toronto to obtain an index of difficulty for each
item. Using these indices, two forms of the test were produced: Form I
consisted of items arranged in order from easy to difficult; in Form II
the items were arranged in the reverse order.
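The construction of the two forms can be sketched as follows. This is a minimal sketch, assuming the pretest "index of difficulty" is the proportion of students answering an item correctly, so that a larger value means an easier item; the Results section's use of .25 as the average difficulty of the hardest unreached items is consistent with that reading, but the exact index is not stated. The function name `build_forms` and the item labels are hypothetical.

```python
def build_forms(difficulty: dict[str, float]) -> tuple[list[str], list[str]]:
    """Return (Form I, Form II) item orders.

    `difficulty` maps a hypothetical item label to its pretest index,
    assumed here to be the proportion answering correctly (larger = easier).
    """
    # Form I: easiest item (highest proportion correct) first.
    # sorted() is stable, so tied items keep their original relative order.
    form_1 = sorted(difficulty, key=lambda item: difficulty[item], reverse=True)
    # Form II: the same items in exactly the reverse order.
    form_2 = form_1[::-1]
    return form_1, form_2


if __name__ == "__main__":
    pretest = {"item_a": 0.91, "item_b": 0.34, "item_c": 0.62}
    form_1, form_2 = build_forms(pretest)
    print(form_1)  # ['item_a', 'item_c', 'item_b']
    print(form_2)  # ['item_b', 'item_c', 'item_a']
```

Reversing Form I, rather than sorting a second time, guarantees that Form II is the exact mirror of Form I even when two items have the same pretest index.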

Subjects 

The subjects were 106 eleventh grade mathematics summer school
students from two secondary schools in Toronto, Ontario. This total
represented 100% of the summer school enrollment in mathematics in
the two schools.

The incomplete data of subjects who missed one of the two
testing sessions were used wherever possible.
Procedure



Anxiety test. Subjects were told that
the AAT was a questionnaire to find out how they felt about tests.
They were assured that their answers would not be given to their
teachers and that their scores would not be placed in their school
record.



The AAT was administered by a researcher during a class 
period, two weeks prior to the administration of the achievement test. 
The students were unaware that their questionnaire results would be 
used in conjunction with the results of any other tests. 

The instructions for administering the questionnaire were
adapted from the original directions of Alpert and Haber (1960). Most
students were able to complete the questionnaire in about 15 minutes;
slower students were granted additional time so that everyone
would finish.



Achievement test administration. The 32-page mathematics
test was printed in booklet form. Each page in the booklet consisted
of a single sheet of paper. The first two pages in the booklet
included space for student identification and general test directions.
The remaining 30 pages contained the test questions. The subjects
were given 40 minutes in which to complete the test.



To ensure maximum effect of item order on test performance, it
was considered essential that every subject work through the items
in order. To ensure that this was done, three steps were taken:
students were instructed to try the items in order; only one item 
appeared on each page of the test; and written instructions were 
given at the bottom of each page for the subject to go on to the 
next question. 

Subjects were randomly assigned to one of two treatment 
groups. Subjects in one group were administered the standardized 
mathematics achievement test with the items ordered from easy-to-difficult 
(Form I). Subjects in the second group took the same test but with the 
order of items reversed (Form II). The subjects were not aware that 
there were two versions of the test. 

To observe any effects item order might have on the stress of 
the test situation, it was considered essential that the subjects 
perceived the test as being important. To ensure that the subjects 
were properly motivated, the following statement was read immediately
before the general directions for the achievement test: "This morning
you are going to take a Grade 11 Achievement Test in Mathematics. Your
results on this test will be sent to the school, so that your teacher
can use this information in arriving at your final grade."

To reduce the likelihood of an association being made between the two
tests, the administrators in this part of the study were different
from the one who administered the anxiety questionnaire.

Stress measurement. At the outset of the experiment, even
before the motivating instructions for the achievement test had been
read, the subjects were told that they would undergo repeated pulse
rate measurements before, during, and after the test.

After the test instructions had been read and immediately 
before a subject started the test, his pulse rate was measured.
Additional pulse readings were taken at the 10, 20, 30, and 40 minute 
marks of the test. The last reading was made as the test was taken 
from the student. At the same time the pulse reading was taken, the 
item the subject had reached on the test was recorded. 

Scoring the materials. The answer sheet for the AAT provided
five choices for each question. These choices, of which the subject
was to choose one, were "Never", "Sometimes", "About half the time",
"Frequently", and "Always". For the purpose of scoring, these choices
were given numerical values of 0, 1, 2, 3, and 4 respectively. The
score for each item was totalled with the scores for other items on the
same scale to arrive at two scores: a debilitating anxiety score and
a facilitating anxiety score.
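The AAT scoring rules given in this section (response values 0 to 4, odd-numbered items on the debilitating scale, even-numbered items on the facilitating scale) can be expressed as a short function. This is a sketch of only those stated rules, assuming 19 items (10 debilitating plus 9 facilitating) in the combined questionnaire. It does not reproduce the published AAT scoring key, and the names `score_aat` and `RESPONSE_VALUES` are hypothetical.

```python
# Response labels and the numerical values given to them in this study.
RESPONSE_VALUES = {
    "Never": 0,
    "Sometimes": 1,
    "About half the time": 2,
    "Frequently": 3,
    "Always": 4,
}

# Combined questionnaire: 10 debilitating (odd) + 9 facilitating (even) items.
N_ITEMS = 19


def score_aat(responses: list[str]) -> tuple[int, int]:
    """Return (debilitating, facilitating) scale totals.

    responses[0] is the answer to item 1. Odd-numbered items are summed into
    the debilitating score, even-numbered items into the facilitating score.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    debilitating = facilitating = 0
    for item_number, label in enumerate(responses, start=1):
        value = RESPONSE_VALUES[label]  # raises KeyError for an unknown label
        if item_number % 2 == 1:
            debilitating += value
        else:
            facilitating += value
    return debilitating, facilitating


if __name__ == "__main__":
    print(score_aat(["Always"] * 19))  # (40, 36)
```

The maximum possible scores under these rules are therefore 40 on the debilitating scale and 36 on the facilitating scale.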

In the Mathematics Achievement Test, the subjects answered 
the questions by circling one of the five choices available. Subjects'
responses, coded 0, 1, 2, 3 or 4, were key-punched onto IBM cards and
scored on an IBM 7094 computer. Since the directions told students 
to guess, no correction for guessing was employed. 
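Because students were told to guess, each achievement test was scored by simply counting correct answers (number-right scoring). The sketch below assumes each response is stored as the 0 to 4 code of the circled choice, and it assumes an omitted item is stored as `None`. The text does not say how omissions were coded. For contrast only, the sketch also shows the conventional correction-for-guessing formula R − W/(k − 1), which this study did not use. Both function names are hypothetical.

```python
from typing import Optional, Sequence


def number_right(key: Sequence[int], responses: Sequence[Optional[int]]) -> int:
    """Count correct answers; wrong answers and omissions (None) score 0."""
    if len(key) != len(responses):
        raise ValueError("key and responses must have the same length")
    return sum(1 for k, r in zip(key, responses) if r == k)


def corrected_for_guessing(
    key: Sequence[int], responses: Sequence[Optional[int]], n_choices: int = 5
) -> float:
    """Conventional formula score R - W/(k - 1). NOT used in this study."""
    right = number_right(key, responses)
    wrong = sum(1 for k, r in zip(key, responses) if r is not None and r != k)
    return right - wrong / (n_choices - 1)


if __name__ == "__main__":
    key = [0, 1, 2, 3]
    answers = [0, 1, 4, None]  # two right, one wrong, one omitted
    print(number_right(key, answers))  # 2
```

Under number-right scoring, a wild guess can only add to a score, which is why telling students to guess and skipping the correction go together.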

Statistical Derivation of Stress Scores 

It was reported in a previous section that five pulse readings
were taken on each subject: before he started the test, and then at
the 10, 20, 30, and 40 minute marks of the test. The last reading
was made as the test was taken from the student.

The stress score for each subject was found by averaging
the second, third, and fourth pulse readings (the during-test
readings). This was done in all cases except when the subject had
finished the test before his third or fourth reading was taken. In
the case where the subject finished the test before the third reading
was taken, his stress score was simply his pulse reading at the 10
minute mark. In the more frequent case where a subject finished the
test after the third and before the fourth reading, his stress score
was the average of his second and third pulse readings. This procedure
of arriving at stress scores was adopted in an attempt to remove the
effect of extraneous factors affecting stress scores for those who
finished the test early.
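The stress-score rule above can be written as a small function. This is a sketch that assumes the three during-test readings (10, 20, and 30 minutes) are supplied per subject, with `None` for any reading not taken because the subject had already finished. The function name and argument layout are hypothetical. As in the text, the pre-test (first) and 40-minute (fifth) readings are not used. The first reading serves separately as the pre-test covariate in the Results.

```python
from statistics import mean
from typing import Optional


def stress_score(r10: float, r20: Optional[float], r30: Optional[float]) -> float:
    """Stress score (beats per minute) from the during-test pulse readings.

    r10, r20, r30 are the second, third, and fourth readings, taken at the
    10, 20, and 30 minute marks; None means the reading was never taken
    because the subject had already finished the test.
    """
    if r20 is None:
        if r30 is not None:
            raise ValueError("a 30-minute reading implies a 20-minute reading")
        # Finished before the third reading: use the 10-minute reading alone.
        return r10
    if r30 is None:
        # Finished between the third and fourth readings: average two readings.
        return mean([r10, r20])
    # Usual case: average all three during-test readings.
    return mean([r10, r20, r30])


if __name__ == "__main__":
    print(stress_score(80, 76, 72))      # 76
    print(stress_score(80, 76, None))    # 78
    print(stress_score(80, None, None))  # 80
```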




Results 



Order of Test Items

For each form of the test, a plot was made of the item
difficulty level versus the item position in the test. The results
revealed that except for a few misplacements, the majority of items
on each form were in the desired order. The rank correlation between
item position in the test and the position the item should have been
in, based on the item difficulty level as estimated from the data of
this study, was .71 for the easy-to-difficult order and .52 for the
difficult-to-easy order.
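The rank correlation above compares each item's actual position with the position it would occupy if the items were perfectly ordered by the difficulty estimates from this study. This is a minimal sketch that assumes Spearman's rank-order coefficient (the text says only "rank correlation"). It also assumes a difficulty index equal to the proportion answering correctly. The function name `rank_agreement` is hypothetical, and tied indices are broken by actual position rather than given averaged ranks.

```python
def rank_agreement(p_values: list[float], easy_first: bool = True) -> float:
    """Spearman rho between actual item position and ideal position.

    p_values[i] is the proportion answering the item in position i + 1
    correctly (higher = easier). The ideal order runs easiest-to-hardest
    when easy_first is True, hardest-to-easiest otherwise. Uses the no-ties
    formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    n = len(p_values)
    if n < 2:
        raise ValueError("need at least two items")
    # Item indices listed in their ideal order (stable sort: ties keep position).
    order = sorted(range(n), key=lambda i: p_values[i], reverse=easy_first)
    ideal = [0] * n
    for rank, i in enumerate(order, start=1):
        ideal[i] = rank
    d_squared = sum((pos - ideal[pos - 1]) ** 2 for pos in range(1, n + 1))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    print(rank_agreement([0.9, 0.7, 0.8]))  # one swapped pair of three items: 0.5
```

A value of 1 means every item sits exactly where its difficulty says it should. The reported .71 and .52 therefore indicate that the realized orders, judged by this sample's data, only approximated the intended ones.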

Effect of Item Order on Test Performance

Out of 30 questions, the subjects who took the easy-to-difficult
form of the test (N = 55) averaged 11.41 correct answers. The subjects
taking the difficult-to-easy form of the test (N = 51) averaged 9.96
correct answers.

The test for the significance of differences in test performance
under the two item orders was carried out using a two-factor analysis
of variance design (Lindquist, 1953, pp. 127-132). The two factors
were item order and sex. Sex was introduced as a factor in the design
to make the test for the significance of the item-order effect more
sensitive. Because the assignment of students to test forms was done
at random, and because there were more boys than girls in the sample,
the number of observations per cell of the analysis of variance table
varied from 19 to 32. A modification of the analysis of variance
procedure was used to take into account the unequal numbers (Winer,
1962, pp. 222-224).
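The text does not name the unequal-n modification beyond the page reference. One approach Winer describes for unequal cell sizes is the unweighted-means analysis, sketched below on the assumption that it is the method meant here. We have not checked it against pp. 222-224. Cell means are combined with equal weight. Main-effect and interaction sums of squares are scaled by the harmonic mean of the cell sizes. The error term is the ordinary pooled within-cell mean square. The function name is hypothetical. It returns F ratios and degrees of freedom only. Probabilities would need an F distribution, for example SciPy's `scipy.stats.f`.

```python
from statistics import mean


def unweighted_means_anova(cells: dict[tuple, list[float]]) -> dict[str, float]:
    """Two-factor unweighted-means ANOVA for unequal cell sizes.

    `cells` maps (level of factor A, level of factor B), e.g. (item order, sex),
    to the scores in that cell. Every combination of levels must be present.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    p, q = len(a_levels), len(b_levels)
    cell_mean = {key: mean(scores) for key, scores in cells.items()}

    # Harmonic mean of cell sizes stands in for the common n of a balanced design.
    n_h = p * q / sum(1 / len(scores) for scores in cells.values())

    # Marginal and grand means are unweighted averages of the cell means.
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}
    grand = mean(cell_mean.values())

    ss_a = n_h * q * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n_h * p * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n_h * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )

    # Error term: ordinary pooled within-cell sum of squares.
    ss_within = sum(
        sum((x - cell_mean[key]) ** 2 for x in scores) for key, scores in cells.items()
    )
    df_within = sum(len(scores) for scores in cells.values()) - p * q
    ms_within = ss_within / df_within

    df_a, df_b = p - 1, q - 1
    df_ab = df_a * df_b
    return {
        "F_A": (ss_a / df_a) / ms_within,
        "F_B": (ss_b / df_b) / ms_within,
        "F_AB": (ss_ab / df_ab) / ms_within,
        "df_A": df_a,
        "df_B": df_b,
        "df_AB": df_ab,
        "df_within": df_within,
    }
```

With equal cell sizes, the harmonic mean equals the common n, and this reduces to the ordinary two-factor analysis of variance.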

The analysis of variance is summarized in Table 1. The main 
effect due to item order was significant at the .05 level. The main 
effect due to sex did not reach the conventional level of significance 
although it approached significance (.05 < p < .10). There was no 
interaction between item order and sex. 

The conclusion that may be drawn on the basis of this analysis 
is that scores on the easy-to-difficult order were on the average 
significantly higher than scores on the difficult-to-easy order and 
this difference was independent of the sex of the examinee. 

A check was made to see whether the difficult-to-easy form of
the test was more speeded than the easy-to-difficult form, since if it
were, the observed difference in performance could be explained,
at least in part, by the failure of subjects working the difficult-to-easy
test form to attempt the easy items appearing later in the test.

TABLE 1

ANALYSIS OF VARIANCE TABLE FOR
MATHEMATICS TEST SCORES
(Item Order X Sex)

Note.--The number of subjects in each cell of the design varied from
19 to 32.

*Significant at the .05 level.

An analysis of subjects' responses revealed that 8 out of 55 subjects
did not complete the easy-to-difficult form, whereas 13 out of 51
subjects did not finish the difficult-to-easy form. (The difference in
the number of subjects completing each form was not statistically
significant when tested by a χ² test for contingency: χ² = 2.00,
d.f. = 1, p > .15.) A more detailed analysis of the data for the
subjects who did not complete the test revealed the following: on the
easy-to-difficult form the 8 subjects who failed to finish did not
attempt a total of 38 items; on the difficult-to-easy form the 13 subjects
who did not finish left a total of 71 untried items. (An untried item
was defined as an item not reached by a subject, as indicated by
the fact that the subject had not reached the page of the test on which
the item appeared.) Because the items appeared in the test one per
page, it was possible to determine quite accurately which items were
not reached. Although almost twice as many items were not reached
on the difficult-to-easy order as on the reverse order, it is unlikely
that this difference affected the obtained results. The argument
that may be advanced in support of this assertion is the following:
the average difficulty of the items not reached on the easy-to-difficult
order (as estimated from data on the group who finished the
easy-to-difficult form) was .25. Had the 8 subjects performed to the
average level of the group finishing the form, the average score on the
easy-to-difficult order would have increased from 11.41 to 11.59. On
the difficult-to-easy form the average difficulty of the items not
reached (as estimated from data on the group who finished the
difficult-to-easy form) was .50. If the 13 subjects had performed to the
average level of the group finishing the form, the average score on
the difficult-to-easy order would have increased from 9.96 to 10.66.

Thus the difference in performance on the two forms would be reduced
from 1.45 correct answers to .93 correct answers. It is likely, however,
that the 21 subjects who failed to finish the test were poorer
mathematics students than the ones who finished; hence their chances
of performing as well on the unfinished items as the subjects who
did finish the test were remote. Thus the difference in performance
on the two forms would probably have been much closer to 1.45
correct answers than to .93 correct answers had sufficient time been
allowed for all subjects to complete the test. The speededness
of the difficult-to-easy test form therefore does not appear to be a
plausible explanation of the obtained difference in average performance
between the easy-to-difficult and difficult-to-easy forms.
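Both speededness checks above can be reproduced from the reported counts. This is a sketch under two assumptions. First, the contingency test is an ordinary Pearson χ² on the 2 × 2 table of finished versus unfinished subjects, with no continuity correction. Second, the "average difficulty" of unreached items is the proportion of the finishing group answering them correctly. Both function names are hypothetical. With 47 and 8 subjects on the easy-to-difficult form and 38 and 13 on the difficult-to-easy form, the first function gives χ² of about 2.00 with p of roughly .16, consistent with the reported values. The adjustment gives about 11.58 and 10.66, a difference of about .93. The 11.58 differs from the reported 11.59 only in the last digit, likely because the reported means are rounded.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) for 1 degree of freedom, where the upper-tail
    probability of a chi-square with df = 1 is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def adjusted_mean(mean_score: float, n_subjects: int,
                  items_unreached: int, p_unreached: float) -> float:
    """Mean score if unreached items had been answered correctly at rate p_unreached."""
    return mean_score + items_unreached * p_unreached / n_subjects


if __name__ == "__main__":
    # Rows: easy-to-difficult, difficult-to-easy; columns: finished, did not finish.
    chi2, p = chi_square_2x2(47, 8, 38, 13)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}")

    easy = adjusted_mean(11.41, 55, 38, 0.25)
    hard = adjusted_mean(9.96, 51, 71, 0.50)
    print(f"{easy:.2f} {hard:.2f} difference {easy - hard:.2f}")
```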

Effect of Item Order on Stress Scores

The means and standard deviations of stress scores and pre-test
stress scores under the two item orders are summarized in Table 2.
The group administered the easy-to-difficult order (N = 24) had an
average stress score of 75.20, whereas the group administered the
reverse order (N = 24) had an average stress score of 76.48. (The
number of subjects in this part of the study was reduced from 106 to 48
because of a malfunction in the pulsemeter during the testing of
students in one of the two schools. With 106 subjects and a
difference of 2 between the stress score means, the power of the F test
for rejecting the hypothesis of no difference between the means was .70.
However, because only 48 subjects were used in the analysis, for the
same difference in stress score means, the power of the F test dropped
to .45. In order to maintain the power at approximately .70, the .10
significance level was adopted for testing the difference between
stress scores under the two forms.)

TABLE 2

MEANS AND STANDARD DEVIATIONS OF STRESS SCORES

                                   Stress Scores      Pre-Test Stress Scores
Test                       N      Mean     S.D.        Mean      S.D.
Easy-to-Difficult Order    24     75.20    6.70        80.29     9.33
Difficult-to-Easy Order    24     76.48    5.00        77.71     8.14

It is clear from Table 2 that prior to the administration of the
test there were differences in the average pre-test stress scores of
the experimental groups. For this reason the difference in the stress
scores of the experimental groups was tested using the method of
analysis of covariance with the pre-test stress score used as the
covariate. The procedure followed was that outlined by Gulliksen
and Wilks (1950).

The results of the analysis of covariance are summarized
in Table 3 and may be explained in the following way.

The analysis of covariance method of Gulliksen and Wilks
(1950) tests three statistical hypotheses.¹ The first is a homogeneity
of variance hypothesis. It states that the variance of stress scores
about the regression line of stress scores on pre-test stress scores
is the same for the two experimental groups. In Table 3 the statistic
that tests this hypothesis, F(H1), is less than the critical value
at the .10 level; thus the observed data do not contradict the
homogeneity of variance hypothesis.

The second statistical hypothesis was then tested. Hypothesis
two is that the slope of the regression of stress scores on pre-test
stress scores is the same for each experimental group. In effect,
hypothesis two asserts that there is no interaction between the effect
of item order and the level of pre-test stress scores. In Table 3,
the statistic that tests this hypothesis is F(H2). Since the
observed value F(H2) is less than the critical value at the .10 level,
the data fail to contradict the hypothesis of equal regression slopes.

The analysis of covariance was completed by testing the third
statistical hypothesis. It states that the intercept of the regression
of stress scores on pre-test stress scores is the same for each
experimental group. This hypothesis asserts that there is no effect
of item order on stress scores. In Table 3, the statistic that tests
this hypothesis is F(H3). The probability of the occurrence by
chance of an F value greater than the one observed, using a one-tailed
test,² is approximately .06. Thus the hypothesis that the regression
lines had equal intercepts can be rejected at the .10 level of
significance. The adjusted stress score mean on the easy-to-difficult
item order was 74.71. For the reverse order, the adjusted stress score
mean was 76.97.

¹A prior assumption underlying the use of analysis of
covariance techniques is that in each experimental group, the regression
of criterion on predictor scores is linear. The results of a linearity
of regression test failed to reject the hypothesis that the regression
of stress scores on pre-test stress scores was linear.

The very tentative conclusion is drawn here that the difficult-to-easy
item order produced a more stressful test situation than the
reverse item order. This conclusion is stated tentatively because of 
the failure to observe a conventional level of significance. What is 
obviously required is additional research to establish the conclusion 
more firmly. 
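The adjusted means reported above are consistent with the usual analysis-of-covariance adjustment. In that adjustment, each group's criterion mean is moved along a common within-group regression line to the grand covariate mean: Ȳ'ⱼ = Ȳⱼ − b_w(X̄ⱼ − X̄). The sketch below assumes that standard adjustment. The text does not report the pooled within-group slope b_w, so the value 0.38 in the example is back-calculated from the reported group means and adjusted means (0.49 / 1.29 ≈ 0.38). It is not a figure from the study. Both function names are hypothetical.

```python
from statistics import mean


def pooled_within_slope(groups: list[tuple[list[float], list[float]]]) -> float:
    """Pooled within-group regression slope of y (criterion) on x (covariate).

    Each element of `groups` is (covariate scores, criterion scores) for one group.
    """
    sxy = sxx = 0.0
    for xs, ys in groups:
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def ancova_adjusted_mean(y_mean: float, x_mean: float,
                         grand_x_mean: float, b_within: float) -> float:
    """Criterion mean adjusted to the grand covariate mean."""
    return y_mean - b_within * (x_mean - grand_x_mean)


if __name__ == "__main__":
    # Group means from Table 2; equal N = 24, so the grand pre-test mean is
    # the simple average of the two group means.
    grand_x = (80.29 + 77.71) / 2
    # Hypothetical slope: back-calculated, not reported in the study.
    b = 0.38
    print(round(ancova_adjusted_mean(75.20, 80.29, grand_x, b), 2))  # easy-to-difficult
    print(round(ancova_adjusted_mean(76.48, 77.71, grand_x, b), 2))  # difficult-to-easy
```

Because the easy-to-difficult group started with the higher pre-test pulse, the adjustment lowers its mean and raises the other group's. As a result, the adjusted gap (about 2.26) is wider than the raw gap of 1.28.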



Interaction of Item Order and Test Anxiety 

The Achievement Anxiety Test gives both a facilitating and
a debilitating anxiety score. Since this part of the study was
concerned with the negative effects of test anxiety on test performance
under varying degrees of stress, only the latter score was used in
the analysis.

On the basis of the debilitating anxiety scores, the 100 subjects
were divided into two groups, high test anxious (HTA, N = 25) and low
test anxious (LTA, N = 25). The HTA and LTA groups included the
upper and lower 25% of the sample.

²Since the purpose of the analysis of covariance was to determine
whether the hypothesized directional difference between stress scores
of the experimental groups is supported by the data, a one-tailed test
of significance was appropriate (Jones, 1954).

The results for HTA and LTA subjects on the two forms of the
test were somewhat surprising. The HTA group taking the difficult-to-easy
form of the test (N = 12) averaged 10.33 correct answers on
the test, whereas the LTA group on the same form (N = 12) averaged
only 9.08 correct answers. On the easy-to-difficult form of the test,
the HTA group (N = 13) averaged 10.00 correct answers. The LTA group
on the same form (N = 13) averaged 11.77 correct answers.
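The extreme-groups split and the descriptive interaction can be sketched from the reported cell means. This assumes that the top and bottom quarters are taken by sorting the debilitating scores, with ties at the cut-off broken arbitrarily. Both function names and the level labels are hypothetical. The LTA minus HTA gap is 1.77 on the easy-to-difficult form and -1.25 on the difficult-to-easy form. The descriptive contrast of about 3.02 therefore runs opposite to the predicted direction. The analysis of variance reported in this section found the interaction not significant.

```python
def extreme_groups(scores: dict[str, float],
                   fraction: float = 0.25) -> tuple[list[str], list[str]]:
    """Return (high, low) subject ids: the top and bottom `fraction` of scores.

    Ties at the cut-off are broken arbitrarily by sort order.
    """
    k = int(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Slice with len(ranked) - k so that k == 0 yields an empty low group.
    return ranked[:k], ranked[len(ranked) - k:]


def interaction_contrast(means: dict[tuple[str, str], float]) -> float:
    """(LTA - HTA) gap on easy-to-difficult minus the gap on difficult-to-easy."""
    gap_easy_first = means[("easy_first", "LTA")] - means[("easy_first", "HTA")]
    gap_hard_first = means[("hard_first", "LTA")] - means[("hard_first", "HTA")]
    return gap_easy_first - gap_hard_first


if __name__ == "__main__":
    reported = {
        ("hard_first", "HTA"): 10.33,
        ("hard_first", "LTA"): 9.08,
        ("easy_first", "HTA"): 10.00,
        ("easy_first", "LTA"): 11.77,
    }
    print(round(interaction_contrast(reported), 2))  # 3.02
```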

The test for the significance of differences in test performance
was carried out using a two-factor analysis of variance design. In
this analysis, the two factors were item order and anxiety. Of special
interest in the analysis was the interaction between the two factors.

The analysis of variance is summarized in Table 4. The main effects
due to item order and anxiety were not significant. The interaction
effect also failed to attain the .05 level of significance. The
conclusion must be that the data of this study provide no evidence
to support the hypothesis that the difference in performance between
high and low test anxious subjects would in general be greater on the
difficult-to-easy order than on the reverse order.
TABLE 4

ANALYSIS OF VARIANCE TABLE FOR
MATHEMATICS TEST SCORES
(Item Order X Anxiety)

Note.--The number of subjects in each cell of the design was either
12 or 13.




Discussion 



This study provides additional clear-cut support for the
contention that item order has an effect on test performance, and
it generalizes the conclusion of previous research to the content
domain of mathematics. It was found that scores on the easy-to-difficult
item order were on the average significantly higher than
scores on the difficult-to-easy order.

In view of the failure to find an interaction between item
order and test anxiety, it seems clear that the personality characteristic
of test anxiety cannot be used to explain the difference in performance
on the two item orders. However, it is possible to speculate that the
concept of 'response sets' in testing provides an explanation.

Cronbach (1950) stated that when a person takes a test, he brings
to the test a number of test-taking habits or response sets which
affect his score. Response sets such as the tendency to work for speed
rather than accuracy and the tendency to guess when uncertain are well
known for their effect on test scores. Although the expectancy that
any achievement test will begin with easy items was not conceived as a
'response set' by Cronbach (1946), it is possible to regard this
expectancy as such. Moreover, because it is common to order items from
easy to difficult, a set to expect items to be ordered in that way may
be present in grade 11 students. When a subject with such an expectancy
encounters difficult items early in a test, he expects even more
difficult items later on, which makes him more anxious, with the likely
result that test performance is adversely affected. This explanation
gains support from the second part of the study, in which it was found
that a difficult-to-easy order of test items produced a more stressful
test situation for subjects than the reverse order of test items.

It was suggested by Cronbach (1946) that if a particular
'response set' exists in a test situation, the test directions can
be revised to reduce the effect of the set. Further research is
needed to determine whether manipulation of test directions would reduce
the observed decrement in test performance produced by administering
items in the order difficult to easy.

The tentative conclusion that stress scores were higher for
subjects on the difficult-to-easy order than on the reverse order provides
some indication of the importance of item order for the stress generated
during a test. Clearly this point deserves further research
to obtain more conclusive evidence than was obtained in the present
study.

Finally, the importance of this study to test constructors seems 
to be the evidence it provides for the discontinuation of the practice 
of making the order of presentation of items in a test different for 
different examinees to reduce the chance of cheating. It is clear on 
the basis of currently available evidence that reordering the items of 
a test in effect produces a test with different properties than the 
original. Hence it may be impossible to make valid comparisons of the 
scores obtained by students who take the same test items in a different 
order. This conclusion is in agreement with the conclusion reached by 
Flaugher, Melton and Myers (1966). 









Summary 



The objectives of this research were to investigate the 
effect of item order on the performance of a mathematics test; on 
the amount of stress generated during a test; and on the performance 
of high and low test anxious subjects. 

106 high school students completed the Achievement Anxiety 
Test. Two weeks later, they were randomly assigned to one of two 
treatment groups. Subjects in one group were administered a 
standardized mathematics achievement test with the items ordered from 
easy to difficult. Subjects in the second group took the same test 
with the order of items reversed. A physiological indicant of stress, 
heart-rate, was measured three times during the test using a pulsemeter. 
The three heart-rate measures for each subject were averaged to obtain 
a stress score. 

Results of this study confirmed the finding of other researchers
that the mean number of correct answers for test questions arranged
in the difficult-to-easy order was significantly lower than the mean
number of correct answers for questions arranged in the reverse order.
This study generalizes the previous result to the content domain of
mathematics.

In addition, this study provides tentative support for the hypothesis
that item order has an effect on the stress generated during a test.
This point deserves further research to obtain more
conclusive evidence than was obtained in this study. Lastly, the data
failed to support the hypothesis of an interaction between item order
and level of test anxiety.